Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch
Authors
Abstract
Compounding, the process of combining several simplex words into a complex whole, is productive in a wide range of languages. Concatenative compounding in particular, in which the components are “glued” together, causes problems, for instance, for computational tools that rely on a predefined lexicon. Here we present the AuCoPro project, which focuses on compounding in the closely related languages Afrikaans and Dutch. The project consists of subprojects on compound splitting (identifying the boundaries of the components) and compound semantics (identifying the semantic relations between the components). We describe the datasets developed in the project and present results demonstrating their effectiveness.
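To make "identifying the boundaries of the components" concrete, the following is a minimal sketch of lexicon-based compound splitting in Python. It is not the AuCoPro method: the toy lexicon, the list of linking elements, the minimum component length, and the function name `split_compound` are all hypothetical choices for illustration. The sketch tries every way to cover the word with lexicon entries, optionally separated by a Dutch linking element (such as the "-en-" in "boekenkast"), and keeps the analysis with the fewest parts.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real splitter would use a large word list.
LEXICON = {"boek", "kast", "voet", "bal", "fiets", "pad"}
# Assumed, simplified set of Dutch linking elements ("" = no linker).
LINKERS = ("", "s", "en", "e")


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Return the segmentation of `word` with the fewest parts, where each
    component is a lexicon entry and linking elements are kept as separate
    parts; return None if no segmentation exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Fewest-part segmentation of the suffix word[i:], or None.
        if i == len(word):
            return ()
        candidates = []
        for j in range(i + min_len, len(word) + 1):
            head = word[i:j]
            if head not in lexicon:
                continue
            for link in LINKERS:
                k = j + len(link)
                if word[j:k] != link:
                    continue  # the linker is not present at this position
                if link and k == len(word):
                    continue  # a linker cannot end the word
                rest = best(k)
                if rest is not None:
                    candidates.append((head,) + ((link,) if link else ()) + rest)
        if not candidates:
            return None
        return min(candidates, key=len)

    return best(0)


print(split_compound("boekenkast"))  # ('boek', 'en', 'kast')
print(split_compound("voetbal"))     # ('voet', 'bal')
```

A purely lexicon-driven splitter like this fails on components missing from the lexicon and can overgenerate spurious splits when short lexicon entries happen to match, which is one reason annotated boundary datasets are useful for building and evaluating splitters.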
Similar resources
The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Compounding is very productive and leads to practi...
Wordsyoudontknow: Evaluation of Lexicon-based Decompounding with Unknown Handling
Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation
Modelling Regular Subcategorization Changes in German Particle Verbs
German particle verbs are a type of multiword expression that is often compositional with respect to a base verb. If they are compositional, they tend to express the same types of semantic arguments, but not necessarily in the same syntactic subcategorization frame: some arguments may be expressed by differing syntactic subcategorization slots and other arguments may be on...
Classification of Noun-Noun Compound Semantics in Dutch and Afrikaans
This article presents initial results on a supervised machine learning approach to determine the semantics of noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine method used for this classification experiment utili...
More Than Only Noun-Noun Compounds: Towards an Annotation Scheme for the Semantic Modelling of Other Noun Compound Types
The computational processing of compound semantics poses several interesting challenges. Up to now, the processing of nominal compounds with non-noun left-hand constituents (henceforth XN compounds) has not received any attention, despite the fact that these also seem to be rather productive in Germanic languages. In our research project, we aim to fill this gap by investigating various kind...
Annotation Guidelines for Compound Analysis
This technical report introduces three sets of annotation guidelines for the analysis of compounds in Afrikaans and Dutch. The first protocol serves the annotation of compound boundaries when creating a dataset to use for compound segmentation. The second and third protocols serve the semantic annotation of the relation between the constituents of compounds. Where the second protocol only focuse...